Bufferless NOC Simulation of Large Multicore System on GPU Hardware

Authors

  • Navin Kumar
  • Aryabartta Sahu
Abstract

Last-level cache management and the core interconnection network play important roles in the performance and power consumption of multicore systems. Large-scale chip multicores widely use mesh interconnects because of the scalability and simplicity of the mesh design. As the interconnection network occupies significant area and consumes a significant percentage of system power, a bufferless network is an appealing alternative design for reducing power consumption and hardware cost. We have designed and implemented a simulator for distributed cache management in large chip multicores whose cores are connected by a bufferless interconnection network. We have also redesigned the simulator and implemented a GPU-compatible parallel version of it using the CUDA programming model. We simulated target large chip multicores with up to 43,000 cores and achieved up to a 25x speedup on an NVIDIA GeForce GTX 690 GPU over serial simulation.
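To make the idea of a bufferless mesh network concrete, here is a minimal sketch of one routing step at a single router, written in Python for readability. It is not the authors' simulator: the function names (`valid_ports`, `productive_ports`, `route_step`), the oldest-first arbitration, and the flit representation as `(age, dest)` tuples are all assumptions chosen for illustration. The only point it shows is that, without buffers, every incoming flit must leave on some output port in the same step, so a flit that loses arbitration is deflected rather than stored.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Flit = Tuple[int, Coord]  # (age, destination); hypothetical representation

# Port name -> (dx, dy) step on the 2D mesh
PORTS: Dict[str, Coord] = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def valid_ports(pos: Coord, width: int, height: int) -> List[str]:
    """Ports that lead to a neighbour inside the width x height mesh."""
    x, y = pos
    return [p for p, (dx, dy) in PORTS.items()
            if 0 <= x + dx < width and 0 <= y + dy < height]


def productive_ports(pos: Coord, dest: Coord) -> List[str]:
    """Ports that move a flit one hop closer to dest (X direction listed first)."""
    x, y = pos
    tx, ty = dest
    ports = []
    if tx > x:
        ports.append("E")
    elif tx < x:
        ports.append("W")
    if ty > y:
        ports.append("N")
    elif ty < y:
        ports.append("S")
    return ports


def route_step(pos: Coord, flits: List[Flit],
               width: int, height: int) -> List[Tuple[Flit, str]]:
    """Assign every flit at router pos an output port, oldest flit first.

    Assumes flits already at their destination were ejected beforehand.
    A flit whose productive ports are all taken is deflected to a free port.
    """
    free = valid_ports(pos, width, height)
    # Bufferless invariant: never more flits than output links.
    assert len(flits) <= len(free)
    assignment = []
    for flit in sorted(flits, key=lambda f: -f[0]):
        _, dest = flit
        wanted = [p for p in productive_ports(pos, dest) if p in free]
        port = wanted[0] if wanted else free[0]  # deflect if nothing productive
        free.remove(port)
        assignment.append((flit, port))
    return assignment
```

For example, if two flits at router (1, 1) of a 4 x 4 mesh both want to travel east, the older one takes the east port and the younger one is deflected to another free port. Because each router's decision depends only on the flits arriving at it in that step, a parallel version could plausibly process routers concurrently, though how the paper's CUDA implementation actually partitions the work is not stated in the abstract.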


Similar resources

Design of a novel congestion-aware communication mechanism for wireless NoC architecture in multicore systems

Hybrid Wireless Network-on-Chip (WNoC) architecture has emerged as a scalable communication structure to mitigate the deficits of traditional NoC architecture for future multicore systems. The hybrid WNoC architecture provides energy-efficient, high-data-rate, and flexible communication for NoC architectures. In these architectures, each wireless router is shared by a set of processing core...


Making-a-stop: A new bufferless routing algorithm for on-chip network

In the deep submicron regime, the power and area consumed by router buffers in network-on-chip (NoC) have become a primary concern. By eliminating buffers, bufferless routing is emerging as a promising solution for power- and area-efficient NoCs. In this paper, we present a new bufferless routing algorithm that can be coupled with any topology. The proposed routing algorithm is base...


Cycle-Accurate 64-Core FPGA-Based Hybrid Simulator

Nowadays, computer architecture research mainly focuses on multicore hardware and software design. Compared with their traditional uniprocessor counterparts, multicore simulators are dramatically more complex, driven by the increase in core count. Full-system fidelity, fast simulation speed, and cycle-level accuracy are the essential requirements of the advan...


Ephedrine QoS: An Antidote to Slow, Congested, Bufferless NoCs

Datacenters consolidate diverse applications to improve utilization. However, when multiple applications are colocated on such platforms, contention for shared resources like networks-on-chip (NoCs) can degrade the performance of latency-critical online services (high-priority applications). Recently proposed bufferless NoCs (Nychis et al.) have the advantages of requiring less area and power, b...


Automatic Embedded Multicore Generation and Evaluation Methodology: a Case Study of a NOC Based 2400-cores on Very Large Scale Emulator

Future-generation embedded multicores will be based on hundreds of processors connected through a Network on Chip (NoC). Design productivity for embedded multicores is a major challenge for the semiconductor industry. In this paper, an automatic very-large-scale NoC design methodology based on FPGA IP is proposed to accelerate embedded multicore design productivity using very large scale multi-...



Journal:
  • CoRR

Volume: abs/1508.03235

Publication date: 2015